Lesson 1: Working with Gridded Spatial Data in Python¶

Objective: Introduce packages for working with gridded spatial data in Python and learn how to use them to manipulate spatial data. We will work with multidimensional gridded data in xarray and perform geospatial operations on it using rioxarray.

Step 1. Load the necessary libraries¶

In [1]:
import fsspec #for connecting to data on AWS

import warnings #suppress warnings
warnings.filterwarnings('ignore')

import xarray as xr #for gridded data
import numpy as np #for arrays in python

from dask.diagnostics import ProgressBar #progress bar for dask computations

Step 2. Gridded data with xarray¶

Xarray is a package for working with multidimensional gridded data in Python. While the package numpy provides many of the core operations we need for working with gridded data (indexing, matrix operations, etc.), it does not provide the functionality to name the dimensions of arrays, attach coordinates to grid cells, or store important metadata. This is where xarray comes in.

By including labels on array dimensions xarray opens up many new possibilities:

  • applying operations over dimensions by name: x.sum('time').

  • selecting values by label: x.sel(time='2014-01-01').

  • using the split-apply-combine paradigm with groupby: x.groupby('time.dayofyear').mean().

  • keeping track of arbitrary metadata in the form of a Python dictionary: x.attrs.

  • and much more
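
These labeled operations can be sketched with a tiny hypothetical array (the names and values are illustrative):

```python
import numpy as np
import pandas as pd
import xarray as xr

#three days of values on a 2x2 grid
da = xr.DataArray(
    np.arange(12).reshape(3, 2, 2),
    dims=("time", "lat", "lon"),
    coords={"time": pd.date_range("2014-01-01", periods=3)},
)

print(da.sum("time").values)             #reduce over a named dimension
print(da.sel(time="2014-01-02").values)  #select by label
print(da.groupby("time.dayofyear").mean().sizes)  #split-apply-combine
```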

The xarray data structure makes it trivial to go from 2 to 3 to 4 to N dimensions, so it is a great choice for working with gridded data, where we typically have at least three dimensions (lat, lon, time). Another big benefit is that it integrates seamlessly with Dask, a popular library for parallel computing in Python. This allows us to scale analysis with xarray to very large datasets.

The core data structure of xarray is the xarray.DataArray, which in its simplest form is just a numpy array with named dimensions and coordinates on those dimensions. We can combine multiple xarray.DataArray objects into a single structure called an xarray.Dataset. Let's see what this looks like

In [2]:
#create a 2x3 np array
arr = np.array([[1,2,3],[5,6,7]])

#create a xarray.DataArray by naming the dims and giving them coordinates
xda = xr.DataArray(arr,
                    dims=("x", "y"),
                    coords={"x": [10, 20],
                            "y": [1.1,1.2,1.3]})

xda
Out[2]:
<xarray.DataArray (x: 2, y: 3)> Size: 48B
array([[1, 2, 3],
       [5, 6, 7]])
Coordinates:
  * x        (x) int64 16B 10 20
  * y        (y) float64 24B 1.1 1.2 1.3

We can access the individual components, such as the data itself, the dimension names, or the coordinates, using accessors

In [3]:
#get the underlying array/matrix
print(xda.values)

#get the dimension names
print(xda.dims)

#get the x coordinates
print(xda.coords['x'])
[[1 2 3]
 [5 6 7]]
('x', 'y')
<xarray.DataArray 'x' (x: 2)> Size: 16B
array([10, 20])
Coordinates:
  * x        (x) int64 16B 10 20

We can set or get any metadata attribute we like

In [4]:
xda.attrs["long_name"] = "random measurement"
xda.attrs["random_attribute"] = 123

print(xda.attrs)
{'long_name': 'random measurement', 'random_attribute': 123}

and perform calculations on xarray.DataArrays as if they were numpy arrays

In [5]:
xda + 10
Out[5]:
<xarray.DataArray (x: 2, y: 3)> Size: 48B
array([[11, 12, 13],
       [15, 16, 17]])
Coordinates:
  * x        (x) int64 16B 10 20
  * y        (y) float64 24B 1.1 1.2 1.3
Attributes:
    long_name:         random measurement
    random_attribute:  123
In [6]:
np.sin(xda)
Out[6]:
<xarray.DataArray (x: 2, y: 3)> Size: 48B
array([[ 0.84147098,  0.90929743,  0.14112001],
       [-0.95892427, -0.2794155 ,  0.6569866 ]])
Coordinates:
  * x        (x) int64 16B 10 20
  * y        (y) float64 24B 1.1 1.2 1.3
Attributes:
    long_name:         random measurement
    random_attribute:  123

An xarray.Dataset is a container of multiple aligned DataArray objects

In [7]:
#create a new DataArray with aligned dimensions (it may have more or fewer dims)
#this one is 2x3x4
arr2 = np.random.randn(2, 3, 4)
xda2 = xr.DataArray(arr2,
                    dims=("x", "y","z"),
                    coords={"x": [10, 20],
                            "y": [1.1,1.2,1.3],
                            "z": [20,200,2000,20000]})

#combine with another xarray.DataArray to make a xarray.Dataset
xds = xr.Dataset({'foo':xda,'bar':xda2})
xds
Out[7]:
<xarray.Dataset> Size: 312B
Dimensions:  (x: 2, y: 3, z: 4)
Coordinates:
  * x        (x) int64 16B 10 20
  * y        (y) float64 24B 1.1 1.2 1.3
  * z        (z) int64 32B 20 200 2000 20000
Data variables:
    foo      (x, y) int64 48B 1 2 3 5 6 7
    bar      (x, y, z) float64 192B 0.3587 0.8288 0.7981 ... -0.2103 -0.05131

Here you can see that we have multiple arrays in a single dataset. Xarray automatically aligns the arrays based on shared dimensions and coordinates. You can do almost everything you can do with DataArray objects with Dataset objects (including indexing and arithmetic), if you prefer to work with multiple variables at once. You can also easily retrieve a single DataArray by name from a Dataset

In [8]:
xds.foo
# xds['foo'] works the same
Out[8]:
<xarray.DataArray 'foo' (x: 2, y: 3)> Size: 48B
array([[1, 2, 3],
       [5, 6, 7]])
Coordinates:
  * x        (x) int64 16B 10 20
  * y        (y) float64 24B 1.1 1.2 1.3
Attributes:
    long_name:         random measurement
    random_attribute:  123

Terminology¶

It is important to be precise with our terminology when dealing with xarray objects, as things can quickly get confusing when working with many dims. The full glossary can be found here, but a quick recap:

  • xarray.DataArray - A multi-dimensional array with labeled or named dimensions
  • xarray.Dataset - A collection of DataArrays with aligned dimensions
  • Dimension - The (named) axes of an array
  • Coordinate - An array that labels a dimension


Step 3. Loading data from the cloud¶

Xarray supports reading and writing several file formats, from simple pickle files to the more flexible netCDF format and the cloud-optimized zarr format. When working with complex multidimensional data, file formats start to matter a lot: they make a big difference to how fast and efficiently we can load and analyse data. More on this in the next lesson.

We can load files into a new Dataset using open_dataset(). Similarly, a DataArray can be saved to disk using the DataArray.to_netcdf() or DataArray.to_zarr() method.
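
As a minimal sketch of this round trip (the filename is illustrative, and a netCDF backend such as netCDF4 or h5netcdf is assumed to be installed):

```python
import os
import tempfile

import numpy as np
import xarray as xr

#a small illustrative Dataset
ds = xr.Dataset(
    {"foo": (("x", "y"), np.arange(6).reshape(2, 3))},
    coords={"x": [10, 20], "y": [1.1, 1.2, 1.3]},
)

#save to disk, then load it back
path = os.path.join(tempfile.mkdtemp(), "example.nc")
ds.to_netcdf(path)
ds_back = xr.open_dataset(path)

print(ds_back.foo.values)
```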

We can easily work with datasets stored on our local hard drive using xarray, but we are limited by two key constraints:

  1. The data must fit on our hard disk.
  2. The data must fit in our system's RAM.

While this is sufficient for many tasks, it imposes significant limitations on the size of the data we can handle. For example, if we need to analyze multiple satellite images or large datasets from climate models, we may quickly reach these limits.

Cloud-based data analysis offers a solution to these constraints. Cloud storage is not only cheap but also effectively unlimited. Additionally, we can dynamically scale up the compute power, such as increasing the amount of RAM or the number of CPUs, when required for resource-intensive tasks. This model of connecting scalable compute with virtually unlimited cloud data storage opens up new possibilities for working with large, gridded datasets. It is also very cost-effective: processing 1 TB of data can cost on the order of 0.1 USD.

Amazon Web Services (AWS) is one such cloud platform that facilitates this model. AWS SageMaker provides scalable compute resources, while AWS S3 offers scalable storage.

In the following example, we will demonstrate how to connect to a dataset stored in S3 and open it using xarray. The dataset we will use is satellite-derived sea surface temperature (SST).

In [9]:
#create a mapping to the zarr store on AWS S3 (anonymous access)
s3path = fsspec.get_mapper('s3://mur-sst/zarr', anon=True)

#open data
ds_sst = xr.open_dataset(s3path, 
                         engine='zarr', 
                         chunks='auto')

ds_sst
Out[9]:
<xarray.Dataset> Size: 104TB
Dimensions:           (time: 6443, lat: 17999, lon: 36000)
Coordinates:
  * time              (time) datetime64[ns] 52kB 2002-06-01T09:00:00 ... 2020...
  * lat               (lat) float32 72kB -89.99 -89.98 -89.97 ... 89.98 89.99
  * lon               (lon) float32 144kB -180.0 -180.0 -180.0 ... 180.0 180.0
Data variables:
    analysed_sst      (time, lat, lon) float64 33TB dask.array<chunksize=(4225, 63, 63), meta=np.ndarray>
    analysis_error    (time, lat, lon) float64 33TB dask.array<chunksize=(4225, 63, 63), meta=np.ndarray>
    mask              (time, lat, lon) int8 4TB dask.array<chunksize=(6443, 100, 100), meta=np.ndarray>
    sea_ice_fraction  (time, lat, lon) float64 33TB dask.array<chunksize=(4225, 63, 63), meta=np.ndarray>
Attributes: (12/47)
    Conventions:                CF-1.7
    Metadata_Conventions:       Unidata Observation Dataset v1.0
    acknowledgment:             Please acknowledge the use of these data with...
    cdm_data_type:              grid
    comment:                    MUR = "Multi-scale Ultra-high Resolution"
    creator_email:              ghrsst@podaac.jpl.nasa.gov
    ...                         ...
    summary:                    A merged, multi-sensor L4 Foundation SST anal...
    time_coverage_end:          20200116T210000Z
    time_coverage_start:        20200115T210000Z
    title:                      Daily MUR SST, Final product
    uuid:                       27665bc0-d5fc-11e1-9b23-0800200c9a66
    westernmost_longitude:      -180.0

Chunks?¶

When opening our data, we can specify that we want it split into chunks along each dimension.

What does this do, and why should we do it?¶

If you don't specify that you want the dataset chunked, xarray will load all the data into a numpy array. This can be okay if you are working with a small dataset, but as your data grows larger, chunking has a number of advantages:

  • Efficient memory usage: Without chunking, xarray loads the entire dataset into memory as NumPy arrays, which can use a lot of RAM and may cause your system to slow down or crash. Chunking splits the data into smaller pieces, allowing you to work with datasets that are bigger than your available memory by loading only what you need.

  • Better performance: Processing smaller chunks can speed up computations and make data handling more efficient. Data is loaded into memory only when required, reducing unnecessary memory usage and improving processing speed.

Default chunking and rechunking¶

Some file types like netCDF, zarr, or cloud-optimized GeoTIFF have native chunking, and it is usually most efficient to use the chunking that is already present. If you specify chunks='auto', the chunking will be determined automatically from the file. This is a major advantage, as chunking/rechunking can be expensive for large files. The downside is that you are subject to the chunking chosen by the creator of the file.
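
As a small sketch of chunking and rechunking (an in-memory array stands in for a real file here; dask, imported in Step 1, is assumed to be installed):

```python
import numpy as np
import xarray as xr
from dask.diagnostics import ProgressBar

#a hypothetical 100x100 array, split into four 50x50 chunks
#(for a real file, .chunk() rechunks, which can be expensive)
da = xr.DataArray(np.random.rand(100, 100), dims=("lat", "lon")).chunk({"lat": 50, "lon": 50})
print(da.chunks)

#nothing is computed until .compute(); the ProgressBar shows the chunked work
with ProgressBar():
    result = da.mean().compute()
print(float(result))
```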

Check out the dask documentation on chunks to find out more about chunking.

Indexing, selecting and masking¶

While you can use numpy-like indexing, e.g. da[:, :], this does not make use of the power of having named dims and coords. Xarray has specific methods for selecting by position in the array, .isel(), and by coordinate value, .sel()

In [10]:
#indexing by position
ds_sst.isel(lon=20,lat=20)
Out[10]:
<xarray.Dataset> Size: 213kB
Dimensions:           (time: 6443)
Coordinates:
  * time              (time) datetime64[ns] 52kB 2002-06-01T09:00:00 ... 2020...
    lat               float32 4B -89.79
    lon               float32 4B -179.8
Data variables:
    analysed_sst      (time) float64 52kB dask.array<chunksize=(4225,), meta=np.ndarray>
    analysis_error    (time) float64 52kB dask.array<chunksize=(4225,), meta=np.ndarray>
    mask              (time) int8 6kB dask.array<chunksize=(6443,), meta=np.ndarray>
    sea_ice_fraction  (time) float64 52kB dask.array<chunksize=(4225,), meta=np.ndarray>
Attributes: (12/47)
    Conventions:                CF-1.7
    Metadata_Conventions:       Unidata Observation Dataset v1.0
    acknowledgment:             Please acknowledge the use of these data with...
    cdm_data_type:              grid
    comment:                    MUR = "Multi-scale Ultra-high Resolution"
    creator_email:              ghrsst@podaac.jpl.nasa.gov
    ...                         ...
    summary:                    A merged, multi-sensor L4 Foundation SST anal...
    time_coverage_end:          20200116T210000Z
    time_coverage_start:        20200115T210000Z
    title:                      Daily MUR SST, Final product
    uuid:                       27665bc0-d5fc-11e1-9b23-0800200c9a66
    westernmost_longitude:      -180.0

We can use all the same techniques, but provide coordinate values rather than positions, if we use .sel(). We can also provide an option for what to do if there is no exact match to the provided coordinates.

In [11]:
ds_sst.sel(lon=-10,lat=-10,method='nearest')
Out[11]:
<xarray.Dataset> Size: 213kB
Dimensions:           (time: 6443)
Coordinates:
  * time              (time) datetime64[ns] 52kB 2002-06-01T09:00:00 ... 2020...
    lat               float32 4B -10.0
    lon               float32 4B -10.0
Data variables:
    analysed_sst      (time) float64 52kB dask.array<chunksize=(4225,), meta=np.ndarray>
    analysis_error    (time) float64 52kB dask.array<chunksize=(4225,), meta=np.ndarray>
    mask              (time) int8 6kB dask.array<chunksize=(6443,), meta=np.ndarray>
    sea_ice_fraction  (time) float64 52kB dask.array<chunksize=(4225,), meta=np.ndarray>
Attributes: (12/47)
    Conventions:                CF-1.7
    Metadata_Conventions:       Unidata Observation Dataset v1.0
    acknowledgment:             Please acknowledge the use of these data with...
    cdm_data_type:              grid
    comment:                    MUR = "Multi-scale Ultra-high Resolution"
    creator_email:              ghrsst@podaac.jpl.nasa.gov
    ...                         ...
    summary:                    A merged, multi-sensor L4 Foundation SST anal...
    time_coverage_end:          20200116T210000Z
    time_coverage_start:        20200115T210000Z
    title:                      Daily MUR SST, Final product
    uuid:                       27665bc0-d5fc-11e1-9b23-0800200c9a66
    westernmost_longitude:      -180.0
    history :
    created at nominal 4-day latency; replaced nrt (1-day latency) version.
    id :
    MUR-JPL-L4-GLOB-v04.1
    institution :
    Jet Propulsion Laboratory
    keywords :
    Oceans > Ocean Temperature > Sea Surface Temperature
    keywords_vocabulary :
    NASA Global Change Master Directory (GCMD) Science Keywords
    license :
    These data are available free of charge under data policy of JPL PO.DAAC.
    metadata_link :
    http://podaac.jpl.nasa.gov/ws/metadata/dataset/?format=iso&shortName=MUR-JPL-L4-GLOB-v04.1
    naming_authority :
    org.ghrsst
    netcdf_version_id :
    4.1
    northernmost_latitude :
    90.0
    platform :
    Terra, Aqua, GCOM-W, MetOp-A, MetOp-B, Buoys/Ships
    processing_level :
    L4
    product_version :
    04.1
    project :
    NASA Making Earth Science Data Records for Use in Research Environments (MEaSUREs) Program
    publisher_email :
    ghrsst-po@nceo.ac.uk
    publisher_name :
    GHRSST Project Office
    publisher_url :
    http://www.ghrsst.org
    references :
    http://podaac.jpl.nasa.gov/Multi-scale_Ultra-high_Resolution_MUR-SST
    sensor :
    MODIS, AMSR2, AVHRR, in-situ
    source :
    MODIS_T-JPL, MODIS_A-JPL, AMSR2-REMSS, AVHRRMTA_G-NAVO, AVHRRMTB_G-NAVO, iQUAM-NOAA/NESDIS, Ice_Conc-OSISAF
    southernmost_latitude :
    -90.0
    spatial_resolution :
    0.01 degrees
    standard_name_vocabulary :
    NetCDF Climate and Forecast (CF) Metadata Convention
    start_time :
    20200116T090000Z
    stop_time :
    20200116T090000Z
    summary :
    A merged, multi-sensor L4 Foundation SST analysis product from JPL.
    time_coverage_end :
    20200116T210000Z
    time_coverage_start :
    20200115T210000Z
    title :
    Daily MUR SST, Final product
    uuid :
    27665bc0-d5fc-11e1-9b23-0800200c9a66
    westernmost_longitude :
    -180.0

We can select contiguous segments along a dimension using `slice`:

In [12]:
ds_sst = ds_sst.sel(lon=slice(-20,-19),lat=slice(-10,-9))
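A detail worth knowing about label-based slicing: unlike positional (integer) slicing, `.sel` with `slice` includes *both* endpoints. A minimal sketch with a small synthetic array (not the MUR SST data) illustrates this:

```python
import numpy as np
import xarray as xr

# toy DataArray with labelled lat/lon coordinates (synthetic values)
da = xr.DataArray(
    np.arange(16.0).reshape(4, 4),
    dims=("lat", "lon"),
    coords={"lat": [-10.0, -9.9, -9.8, -9.7], "lon": [-20.0, -19.9, -19.8, -19.7]},
)

# label-based slicing: both endpoints are INCLUSIVE, unlike positional slicing
sub = da.sel(lat=slice(-10.0, -9.8), lon=slice(-20.0, -19.9))
print(sub.shape)  # (3, 2)
```

Note that `slice(start, stop)` assumes the coordinate is sorted in the direction of the slice; here the latitudes are ascending, so `slice(-10, -9)` works as expected.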

We can mask values in our array with `.where()`, using conditions based on the array values or the coordinate values.

In [13]:
# keep only observations from 2019 onward
ds_sst_2019 = ds_sst.where(ds_sst.time >= np.datetime64('2019-01-01'), drop=True)
ds_sst_2019
Out[13]:
<xarray.Dataset> Size: 110MB
Dimensions:           (time: 385, lat: 101, lon: 101)
Coordinates:
  * time              (time) datetime64[ns] 3kB 2019-01-01T09:00:00 ... 2020-...
  * lat               (lat) float32 404B -10.0 -9.99 -9.98 ... -9.02 -9.01 -9.0
  * lon               (lon) float32 404B -20.0 -19.99 -19.98 ... -19.01 -19.0
Data variables:
    analysed_sst      (time, lat, lon) float64 31MB dask.array<chunksize=(385, 2, 3), meta=np.ndarray>
    analysis_error    (time, lat, lon) float64 31MB dask.array<chunksize=(385, 2, 3), meta=np.ndarray>
    mask              (time, lat, lon) float32 16MB dask.array<chunksize=(385, 1, 1), meta=np.ndarray>
    sea_ice_fraction  (time, lat, lon) float64 31MB dask.array<chunksize=(385, 2, 3), meta=np.ndarray>
Attributes: (12/47)
    Conventions:                CF-1.7
    Metadata_Conventions:       Unidata Observation Dataset v1.0
    acknowledgment:             Please acknowledge the use of these data with...
    cdm_data_type:              grid
    comment:                    MUR = "Multi-scale Ultra-high Resolution"
    creator_email:              ghrsst@podaac.jpl.nasa.gov
    ...                         ...
    summary:                    A merged, multi-sensor L4 Foundation SST anal...
    time_coverage_end:          20200116T210000Z
    time_coverage_start:        20200115T210000Z
    title:                      Daily MUR SST, Final product
    uuid:                       27665bc0-d5fc-11e1-9b23-0800200c9a66
    westernmost_longitude:      -180.0
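The `drop=True` argument matters here. By default `.where()` keeps the original shape and fills masked positions with NaN; with `drop=True` it removes them entirely. A minimal sketch with a synthetic time series:

```python
import numpy as np
import xarray as xr

# toy time series (synthetic): mask out everything before 2019
times = np.array(['2018-12-30', '2018-12-31', '2019-01-01', '2019-01-02'],
                 dtype='datetime64[ns]')
da = xr.DataArray([1.0, 2.0, 3.0, 4.0], dims='time', coords={'time': times})

# without drop=True the masked values become NaN but the dimension keeps its size
masked = da.where(da.time >= np.datetime64('2019-01-01'))
print(masked.sizes['time'])  # 4

# with drop=True the masked entries are removed entirely
dropped = da.where(da.time >= np.datetime64('2019-01-01'), drop=True)
print(dropped.sizes['time'])  # 2
```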

xarray offers most of the common operations you need for gridded data. For example, grouping and aggregation:

In [14]:
ds_sst_2019 = ds_sst_2019.groupby('time.month').mean()
ds_sst_2019
Out[14]:
<xarray.Dataset> Size: 3MB
Dimensions:           (month: 12, lat: 101, lon: 101)
Coordinates:
  * month             (month) int64 96B 1 2 3 4 5 6 7 8 9 10 11 12
  * lat               (lat) float32 404B -10.0 -9.99 -9.98 ... -9.02 -9.01 -9.0
  * lon               (lon) float32 404B -20.0 -19.99 -19.98 ... -19.01 -19.0
Data variables:
    analysed_sst      (month, lat, lon) float64 979kB dask.array<chunksize=(1, 2, 3), meta=np.ndarray>
    analysis_error    (month, lat, lon) float64 979kB dask.array<chunksize=(1, 2, 3), meta=np.ndarray>
    mask              (month, lat, lon) float32 490kB dask.array<chunksize=(1, 1, 1), meta=np.ndarray>
    sea_ice_fraction  (month, lat, lon) float64 979kB dask.array<chunksize=(1, 2, 3), meta=np.ndarray>
Attributes: (12/47)
    Conventions:                CF-1.7
    Metadata_Conventions:       Unidata Observation Dataset v1.0
    acknowledgment:             Please acknowledge the use of these data with...
    cdm_data_type:              grid
    comment:                    MUR = "Multi-scale Ultra-high Resolution"
    creator_email:              ghrsst@podaac.jpl.nasa.gov
    ...                         ...
    summary:                    A merged, multi-sensor L4 Foundation SST anal...
    time_coverage_end:          20200116T210000Z
    time_coverage_start:        20200115T210000Z
    title:                      Daily MUR SST, Final product
    uuid:                       27665bc0-d5fc-11e1-9b23-0800200c9a66
    westernmost_longitude:      -180.0
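The `'time.month'` syntax tells `groupby` to derive the grouping key (the month) from the datetime coordinate; the result gains a new `month` dimension. A minimal sketch on a synthetic daily series:

```python
import numpy as np
import pandas as pd
import xarray as xr

# synthetic daily series spanning two months
times = pd.date_range('2019-01-01', '2019-02-28', freq='D')
da = xr.DataArray(np.ones(len(times)), dims='time', coords={'time': times})

# split-apply-combine: group the 59 daily values into 2 monthly means
monthly = da.groupby('time.month').mean()
print(monthly.month.values)  # [1 2]
```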

⚠️ NOTE: You may notice that xarray code often takes almost no time at all to run. This is because for many operations xarray does not immediately load data from disk and perform the calculation; instead it builds up a description of the computation and prints a high-level summary of the data that will be produced. This is called **lazy computation**, and it is the smart thing to do when working with large datasets. The calculation only happens when you actually need the values, for example when calling `.plot()` or writing results to disk. We can force computation by calling `.compute()`.
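A minimal sketch of this behaviour, assuming dask is installed: chunking an in-memory array makes it dask-backed, after which reductions only build a task graph until `.compute()` is called.

```python
import numpy as np
import xarray as xr

# chunking turns the DataArray into a lazy dask-backed array (requires dask)
da = xr.DataArray(np.arange(10.0), dims='time').chunk({'time': 5})

lazy_mean = da.mean()        # builds a task graph; nothing is computed yet
print(type(lazy_mean.data))  # a dask array, not a numpy array

result = lazy_mean.compute()  # this triggers the actual calculation
print(float(result))  # 4.5
```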

In [15]:
with ProgressBar():
    ds_sst_2019 = ds_sst_2019.compute()
[########################################] | 100% Completed | 7.76 s
In [16]:
ds_sst_2019
Out[16]:
<xarray.Dataset> Size: 3MB
Dimensions:           (month: 12, lat: 101, lon: 101)
Coordinates:
  * month             (month) int64 96B 1 2 3 4 5 6 7 8 9 10 11 12
  * lat               (lat) float32 404B -10.0 -9.99 -9.98 ... -9.02 -9.01 -9.0
  * lon               (lon) float32 404B -20.0 -19.99 -19.98 ... -19.01 -19.0
Data variables:
    analysed_sst      (month, lat, lon) float64 979kB 299.8 299.8 ... 299.3
    analysis_error    (month, lat, lon) float64 979kB 0.3698 0.3698 ... 0.3748
    mask              (month, lat, lon) float32 490kB 1.0 1.0 1.0 ... 1.0 1.0
    sea_ice_fraction  (month, lat, lon) float64 979kB -1.28 -1.28 ... -1.28
Attributes: (12/47)
    Conventions:                CF-1.7
    Metadata_Conventions:       Unidata Observation Dataset v1.0
    acknowledgment:             Please acknowledge the use of these data with...
    cdm_data_type:              grid
    comment:                    MUR = "Multi-scale Ultra-high Resolution"
    creator_email:              ghrsst@podaac.jpl.nasa.gov
    ...                         ...
    summary:                    A merged, multi-sensor L4 Foundation SST anal...
    time_coverage_end:          20200116T210000Z
    time_coverage_start:        20200115T210000Z
    title:                      Daily MUR SST, Final product
    uuid:                       27665bc0-d5fc-11e1-9b23-0800200c9a66
    westernmost_longitude:      -180.0
      valid_max :
      31
      valid_min :
      1
      array([[[1., 1., 1., ..., 1., 1., 1.],
              [1., 1., 1., ..., 1., 1., 1.],
              [1., 1., 1., ..., 1., 1., 1.],
              ...,
              [1., 1., 1., ..., 1., 1., 1.],
              [1., 1., 1., ..., 1., 1., 1.],
              [1., 1., 1., ..., 1., 1., 1.]],
      
             [[1., 1., 1., ..., 1., 1., 1.],
              [1., 1., 1., ..., 1., 1., 1.],
              [1., 1., 1., ..., 1., 1., 1.],
              ...,
              [1., 1., 1., ..., 1., 1., 1.],
              [1., 1., 1., ..., 1., 1., 1.],
              [1., 1., 1., ..., 1., 1., 1.]],
      
             [[1., 1., 1., ..., 1., 1., 1.],
              [1., 1., 1., ..., 1., 1., 1.],
              [1., 1., 1., ..., 1., 1., 1.],
              ...,
      ...
              [1., 1., 1., ..., 1., 1., 1.],
              [1., 1., 1., ..., 1., 1., 1.],
              [1., 1., 1., ..., 1., 1., 1.]],
      
             [[1., 1., 1., ..., 1., 1., 1.],
              [1., 1., 1., ..., 1., 1., 1.],
              [1., 1., 1., ..., 1., 1., 1.],
              ...,
              [1., 1., 1., ..., 1., 1., 1.],
              [1., 1., 1., ..., 1., 1., 1.],
              [1., 1., 1., ..., 1., 1., 1.]],
      
             [[1., 1., 1., ..., 1., 1., 1.],
              [1., 1., 1., ..., 1., 1., 1.],
              [1., 1., 1., ..., 1., 1., 1.],
              ...,
              [1., 1., 1., ..., 1., 1., 1.],
              [1., 1., 1., ..., 1., 1., 1.],
              [1., 1., 1., ..., 1., 1., 1.]]],
            shape=(12, 101, 101), dtype=float32)
    • sea_ice_fraction
      (month, lat, lon)
      float64
      -1.28 -1.28 -1.28 ... -1.28 -1.28
      comment :
      ice data interpolated by a nearest neighbor approach.
      long_name :
      sea ice area fraction
      source :
      EUMETSAT OSI-SAF, copyright EUMETSAT
      standard_name :
      sea ice area fraction
      units :
      fraction (between 0 and 1)
      valid_max :
      100
      valid_min :
      0
      array([[[-1.28, -1.28, -1.28, ..., -1.28, -1.28, -1.28],
              [-1.28, -1.28, -1.28, ..., -1.28, -1.28, -1.28],
              [-1.28, -1.28, -1.28, ..., -1.28, -1.28, -1.28],
              ...,
              [-1.28, -1.28, -1.28, ..., -1.28, -1.28, -1.28],
              [-1.28, -1.28, -1.28, ..., -1.28, -1.28, -1.28],
              [-1.28, -1.28, -1.28, ..., -1.28, -1.28, -1.28]],
      
             [[-1.28, -1.28, -1.28, ..., -1.28, -1.28, -1.28],
              [-1.28, -1.28, -1.28, ..., -1.28, -1.28, -1.28],
              [-1.28, -1.28, -1.28, ..., -1.28, -1.28, -1.28],
              ...,
              [-1.28, -1.28, -1.28, ..., -1.28, -1.28, -1.28],
              [-1.28, -1.28, -1.28, ..., -1.28, -1.28, -1.28],
              [-1.28, -1.28, -1.28, ..., -1.28, -1.28, -1.28]],
      
             [[-1.28, -1.28, -1.28, ..., -1.28, -1.28, -1.28],
              [-1.28, -1.28, -1.28, ..., -1.28, -1.28, -1.28],
              [-1.28, -1.28, -1.28, ..., -1.28, -1.28, -1.28],
              ...,
      ...
              [-1.28, -1.28, -1.28, ..., -1.28, -1.28, -1.28],
              [-1.28, -1.28, -1.28, ..., -1.28, -1.28, -1.28],
              [-1.28, -1.28, -1.28, ..., -1.28, -1.28, -1.28]],
      
             [[-1.28, -1.28, -1.28, ..., -1.28, -1.28, -1.28],
              [-1.28, -1.28, -1.28, ..., -1.28, -1.28, -1.28],
              [-1.28, -1.28, -1.28, ..., -1.28, -1.28, -1.28],
              ...,
              [-1.28, -1.28, -1.28, ..., -1.28, -1.28, -1.28],
              [-1.28, -1.28, -1.28, ..., -1.28, -1.28, -1.28],
              [-1.28, -1.28, -1.28, ..., -1.28, -1.28, -1.28]],
      
             [[-1.28, -1.28, -1.28, ..., -1.28, -1.28, -1.28],
              [-1.28, -1.28, -1.28, ..., -1.28, -1.28, -1.28],
              [-1.28, -1.28, -1.28, ..., -1.28, -1.28, -1.28],
              ...,
              [-1.28, -1.28, -1.28, ..., -1.28, -1.28, -1.28],
              [-1.28, -1.28, -1.28, ..., -1.28, -1.28, -1.28],
              [-1.28, -1.28, -1.28, ..., -1.28, -1.28, -1.28]]],
            shape=(12, 101, 101))
  • Conventions :
    CF-1.7
    Metadata_Conventions :
    Unidata Observation Dataset v1.0
    acknowledgment :
    Please acknowledge the use of these data with the following statement: These data were provided by JPL under support by NASA MEaSUREs program.
    cdm_data_type :
    grid
    comment :
    MUR = "Multi-scale Ultra-high Resolution"
    creator_email :
    ghrsst@podaac.jpl.nasa.gov
    creator_name :
    JPL MUR SST project
    creator_url :
    http://mur.jpl.nasa.gov
    date_created :
    20200124T010755Z
    easternmost_longitude :
    180.0
    file_quality_level :
    3
    gds_version_id :
    2.0
    geospatial_lat_resolution :
    0.009999999776482582
    geospatial_lat_units :
    degrees north
    geospatial_lon_resolution :
    0.009999999776482582
    geospatial_lon_units :
    degrees east
    history :
    created at nominal 4-day latency; replaced nrt (1-day latency) version.
    id :
    MUR-JPL-L4-GLOB-v04.1
    institution :
    Jet Propulsion Laboratory
    keywords :
    Oceans > Ocean Temperature > Sea Surface Temperature
    keywords_vocabulary :
    NASA Global Change Master Directory (GCMD) Science Keywords
    license :
    These data are available free of charge under data policy of JPL PO.DAAC.
    metadata_link :
    http://podaac.jpl.nasa.gov/ws/metadata/dataset/?format=iso&shortName=MUR-JPL-L4-GLOB-v04.1
    naming_authority :
    org.ghrsst
    netcdf_version_id :
    4.1
    northernmost_latitude :
    90.0
    platform :
    Terra, Aqua, GCOM-W, MetOp-A, MetOp-B, Buoys/Ships
    processing_level :
    L4
    product_version :
    04.1
    project :
    NASA Making Earth Science Data Records for Use in Research Environments (MEaSUREs) Program
    publisher_email :
    ghrsst-po@nceo.ac.uk
    publisher_name :
    GHRSST Project Office
    publisher_url :
    http://www.ghrsst.org
    references :
    http://podaac.jpl.nasa.gov/Multi-scale_Ultra-high_Resolution_MUR-SST
    sensor :
    MODIS, AMSR2, AVHRR, in-situ
    source :
    MODIS_T-JPL, MODIS_A-JPL, AMSR2-REMSS, AVHRRMTA_G-NAVO, AVHRRMTB_G-NAVO, iQUAM-NOAA/NESDIS, Ice_Conc-OSISAF
    southernmost_latitude :
    -90.0
    spatial_resolution :
    0.01 degrees
    standard_name_vocabulary :
    NetCDF Climate and Forecast (CF) Metadata Convention
    start_time :
    20200116T090000Z
    stop_time :
    20200116T090000Z
    summary :
    A merged, multi-sensor L4 Foundation SST analysis product from JPL.
    time_coverage_end :
    20200116T210000Z
    time_coverage_start :
    20200115T210000Z
    title :
    Daily MUR SST, Final product
    uuid :
    27665bc0-d5fc-11e1-9b23-0800200c9a66
    westernmost_longitude :
    -180.0

Step 4. Make xarray geospatial with rioxarray¶

Although we have latitude and longitude values associated with our Xarray, this is not yet a proper geospatial dataset, so we cannot perform spatial manipulations like calculating distances or reprojecting. Xarray is a general-purpose tool for any multidimensional data and is not specific to geospatial data. We need an additional package, rioxarray, which brings the power of GDAL to Xarrays. rioxarray extends Xarray with the rio accessor, meaning a set of new geospatial functions becomes available on Xarray objects by typing .rio. It also allows us to open geospatial datasets, such as GeoTIFFs, using xr.open_dataset(..., engine='rasterio')

In [17]:
import rioxarray

We can load a cloud-optimized GeoTIFF (COG) stored on AWS by providing the URL of the file directly. This particular file is a single band from the Sentinel-2 satellite

In [18]:
s2 = xr.open_dataset('https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/34/H/CH/2018/9/S2A_34HCH_20180923_0_L2A/B08.tif',
                     engine='rasterio',
                     chunks='auto')
s2
Out[18]:
<xarray.Dataset> Size: 482MB
Dimensions:      (band: 1, x: 10980, y: 10980)
Coordinates:
  * band         (band) int64 8B 1
  * x            (x) float64 88kB 3e+05 3e+05 3e+05 ... 4.098e+05 4.098e+05
  * y            (y) float64 88kB 6.3e+06 6.3e+06 6.3e+06 ... 6.19e+06 6.19e+06
    spatial_ref  int64 8B ...
Data variables:
    band_data    (band, y, x) float32 482MB dask.array<chunksize=(1, 5120, 5120), meta=np.ndarray>

Because this file has projection information associated with it, we can perform geospatial operations on it, like .clip()

In [19]:
geometries = [
    {
        'type': 'Polygon',
        'coordinates': [[
            [300115, 6250015],
            [310415, 6260015],
            [320815, 6260015],
            [310415, 6250015],
            [300215, 6240015]
        ]]
    }
]
clipped = s2.rio.clip(geometries)
clipped
Out[19]:
<xarray.Dataset> Size: 17MB
Dimensions:      (band: 1, x: 2070, y: 2000)
Coordinates:
  * band         (band) int64 8B 1
  * x            (x) float64 17kB 3.001e+05 3.001e+05 ... 3.208e+05 3.208e+05
  * y            (y) float64 16kB 6.26e+06 6.26e+06 ... 6.24e+06 6.24e+06
    spatial_ref  int64 8B 0
Data variables:
    band_data    (band, y, x) float32 17MB dask.array<chunksize=(1, 1118, 2070), meta=np.ndarray>

or reproject it to a different coordinate reference system

In [20]:
clipped = clipped.rio.reproject('epsg:4326')
clipped
Out[20]:
<xarray.Dataset> Size: 17MB
Dimensions:      (x: 2292, y: 1850, band: 1)
Coordinates:
  * x            (x) float64 18kB 18.84 18.84 18.84 18.84 ... 19.06 19.06 19.06
  * y            (y) float64 15kB -33.78 -33.78 -33.78 ... -33.96 -33.96 -33.97
  * band         (band) int64 8B 1
    spatial_ref  int64 8B 0
Data variables:
    band_data    (band, y, x) float32 17MB nan nan nan nan ... nan nan nan nan

We can also plot it on top of other spatial data. Here we will overlay it with a satellite basemap. We will use the package hvplot to make this plot interactive, allowing us to pan and zoom.

In [21]:
#plotting
import hvplot.xarray
import holoviews as hv
hvplot.extension('bokeh')
In [ ]:
#plot with a satellite basemap
clip_plot = clipped['band_data'].isel(band=0)

#plot
clip_plot.hvplot(tiles=hv.element.tiles.EsriImagery(), 
                              project=True,clim=(1,10000),
                              cmap='magma',frame_width=800,data_aspect=1,alpha=0.7,title='Sentinel 2 near-infrared')
Out[ ]:

Finally, if you want to save your file as a cloud-optimized GeoTIFF, you can use the .rio.to_raster() method and specify COG as the driver parameter

⚠️ NOTE: THE FOLLOWING CODE BLOCK SAVES THE RASTER AS A CLOUD-OPTIMIZED GEOTIFF IN S3. FOR SECURITY REASONS, WE HAVE REMOVED 'WRITE' ACCESS TO THE PUBLIC BUCKET USED IN THIS TRAINING. YOU CAN USE THIS CODE TO SAVE TO YOUR OWN S3 BUCKET WITHIN TNC's AWS ACCOUNT

In [23]:
import os
# By default, GDAL's /vsis3/ handler does not support the random-write
# operations that creating a GeoTIFF requires. Setting this environment
# variable instructs GDAL to stage the write in a local temporary file
# before uploading it to S3.
os.environ["CPL_VSIL_USE_TEMP_FILE_FOR_RANDOM_WRITE"] = "YES"
#s3_cog_path = "/vsis3/your-bucket-name/path/to/output_cog.tif"
s3_path = "/vsis3/ocs-training-2026/advanced/s2_nir_cog.tif"
clip_plot.rio.to_raster(raster_path=s3_path, driver="COG")

Step 5. Data search and discovery¶

How can we find and discover datasets stored in the cloud? To address this challenge, a number of metadata standards have been developed to help organize and document datasets, making them easier to find, query, and use.

These metadata standards serve as structured descriptions of data, allowing us to efficiently search through catalogs and discover datasets that meet specific criteria, such as geographic location, time range, or data type. Two widely used examples are the STAC (SpatioTemporal Asset Catalog) specification and the intake cataloging library.

  • STAC (SpatioTemporal Asset Catalog) is a specification designed for geospatial data, such as satellite imagery or climate model outputs. It organizes data by providing a uniform structure to describe assets (e.g., satellite images) with spatial and temporal metadata. This enables users to efficiently search for and filter relevant data across large, distributed datasets in the cloud.

  • intake is a more general-purpose library that helps manage and access data catalogs in Python. It supports various data formats and types, allowing users to interact with both local and remote datasets in a unified way. With intake, we can browse, search, and load datasets without needing to worry about their underlying storage format or location.

These tools and standards empower users to seamlessly navigate through vast amounts of cloud-hosted data and extract just what is needed for their analyses. By leveraging these metadata-driven catalogs, we can make cloud data discovery efficient, even when dealing with enormous and complex datasets.

Even with robust metadata standards like STAC and intake, we still need to know where to find the catalogs that contain the datasets we’re interested in. This can sometimes be a challenging task, especially given the vast amount of data available across different platforms and cloud providers. However, there are several excellent starting points for discovering cloud-hosted datasets, particularly those related to Earth observation, geospatial analysis, and open data.

Some key resources include:

  • AWS Earth: Amazon Web Services (AWS) hosts a wide range of Earth observation data, making it accessible for analysis in the cloud. The AWS Earth page highlights various datasets related to satellite imagery, weather, and environmental monitoring. It also includes case studies and tools for working with these datasets. This is a great resource for those seeking publicly accessible datasets related to Earth science.

  • AWS Open Data Registry: AWS maintains an extensive Open Data Registry, which catalogs a wide variety of public datasets across different fields, including geospatial data, climate science, genomics, and more. The registry provides detailed information about each dataset, including links to the data on AWS S3, metadata, and documentation. This resource is particularly useful for discovering datasets that are freely accessible for cloud-based analysis.

  • NASA Earthdata: NASA Earthdata provides access to a vast collection of Earth science data, particularly those collected by NASA's satellites and field measurement programs. The platform offers powerful search tools, including the Earthdata Search tool, which allows users to filter and download datasets based on specific criteria like spatial and temporal coverage, data type, and more. NASA Earthdata is a go-to source for anyone working on climate, weather, land cover, and atmospheric studies, with extensive documentation and tutorials available to help users get started. Much of NASA's data is already on AWS, so using it is simply a matter of finding the URL for the data you want.

  • Radiant Earth STAC Browser: The STAC Browser is an interactive web-based tool that allows users to browse STAC-compliant datasets. It provides a user-friendly interface to search for geospatial datasets cataloged using the STAC standard. Radiant Earth is focused on providing open geospatial data for machine learning and Earth observation applications, making this a valuable resource for researchers in these fields.

Despite these resources, discovering the right datasets can still require some trial and error, especially when dealing with specialized or niche datasets. It's important to explore these platforms, understand the types of data available, and take advantage of the metadata standards and search tools they provide to refine your search.

As cloud data storage grows and standards evolve, the process of discovering and accessing large, cloud-hosted datasets will continue to improve, making it easier to find the data you need for complex analyses.

Conclusions¶

We've demonstrated the following concepts in this workbook:

  • Use xarray to efficiently run operations on multidimensional gridded datasets
  • Load data from a public cloud data repository such as data stored in AWS S3
  • Use rioxarray to add geospatial capabilities and spatial data operations to your xarray dataset
  • Take advantage of several cloud data catalogs to pull data into your workflows and avoid downloads

Additional Resources¶

Great places to learn more about working with gridded data in Python:

  • The Carpentries Geospatial Python lesson by Ryan Avery
  • The xarray user guide
  • An Introduction to Earth and Environmental Data Science
  • AWS Skill Builder: This training portal provided by AWS contains self-paced training modules for all of AWS' cloud storage and compute services. While many of the courses are behind a paywall, many of the introductory courses are free to access. Use the web application's filtering function to focus your search, for example to Free courses of the Fundamental skill level focused on Data analytics